Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add metrics for connection pool usage v3 #307

Merged
merged 11 commits into from
May 24, 2024

Conversation

realglebivanov
Copy link
Contributor

@realglebivanov realglebivanov commented May 21, 2024

Hello!

Since these two PRs seem to be abandoned

#306
#292,

I've decided to open another one that will hopefully be merged.

@realglebivanov realglebivanov requested a review from josevalim May 22, 2024 07:40
@josevalim
Copy link
Member

The pool ID should be the name of the Ecto repo, no? My biggest concern is that those costs will be paid by everyone in every operation, regardless of the sampling or need. So I think we should try polling first and fallback to telemetry only if that doesn’t work. Or put the telemetry in this PR behind some periodic action.

@realglebivanov
Copy link
Contributor Author

realglebivanov commented May 22, 2024

@josevalim
Hey, thanks for the review!
I fixed the issue you had found but the tests are a little bit ugly.
Hopefully, we'll find the correct solution.

@josevalim
Copy link
Member

Thank you, this looks great! I think we just need to move it to the DBConnection module now, which is the public API. But perhaps, to do so, we need a callback in the pool and implement it in the ownership pool too? 🤔

@realglebivanov
Copy link
Contributor Author

Fair enough but this will introduce a breaking change into the API.
Every custom pool there is will have to implement the new callback.
Also, it would probably be better to move this to DBConnection.Pool as it seems to be a part of the API as well

@realglebivanov
Copy link
Contributor Author

Actually, it isn't a part of the public API but I guess it should be a part of both DBConnection.Pool and DBConnection.

@realglebivanov realglebivanov changed the title Add a few queue telemetry calls to DBConnection.ConnectionPool Add metrics for connection pool usage v3 May 22, 2024
@josevalim
Copy link
Member

Yes, the pool API is private, so we are fine (but it would be fine to add new callbacks to behaviours nonetheless).

@realglebivanov
Copy link
Contributor Author

realglebivanov commented May 23, 2024

@josevalim I've managed to come up with a sketch of what the public API might look like. Please, take a look!

Copy link
Member

@josevalim josevalim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, beautifully done! Just one last question: do you prefer to return a list of metrics (as in the PR) or return a single metric map where we sum (and/or average) the metrics from the ownership pool?

@realglebivanov
Copy link
Contributor Author

Thanks, José!
I'd rather leave it as it is if you don't mind.
I just wanted the end user to decide in which way it would be convenient to aggregate the metrics.

Comment on lines 108 to 129
{:ok, pool_metrics} = DBConnection.ConnectionPool.get_connection_metrics(pool)

proxy_metrics =
owners
|> Enum.map(fn {_, {proxy, _, _}} ->
if Process.alive?(proxy) do
GenServer.call(proxy, :get_connection_metrics)
end
end)
|> Enum.reject(&is_nil/1)

{:reply, {:ok, pool_metrics ++ proxy_metrics}, state}
catch
:exit, reason ->
if log do
Logger.log(
log,
"Caught :exit while calling :get_connection_metrics due to #{inspect(reason)}"
)
end

{:reply, :error, state}
Copy link
Member

@josevalim josevalim May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{:ok, pool_metrics} = DBConnection.ConnectionPool.get_connection_metrics(pool)
proxy_metrics =
owners
|> Enum.map(fn {_, {proxy, _, _}} ->
if Process.alive?(proxy) do
GenServer.call(proxy, :get_connection_metrics)
end
end)
|> Enum.reject(&is_nil/1)
{:reply, {:ok, pool_metrics ++ proxy_metrics}, state}
catch
:exit, reason ->
if log do
Logger.log(
log,
"Caught :exit while calling :get_connection_metrics due to #{inspect(reason)}"
)
end
{:reply, :error, state}
{:ok, pool_metrics} = DBConnection.ConnectionPool.get_connection_metrics(pool)
proxy_metrics =
owners
|> Enum.map(fn {_, {proxy, _, _}} ->
try do
GenServer.call(proxy, :get_connection_metrics, :infinity)
catch
:exit, _ when log ->
if log do
Logger.log(
log,
"Caught :exit while calling :get_connection_metrics due to #{inspect(reason)}"
)
end
nil
end
end)
|> Enum.reject(&is_nil/1)
{:reply, {:ok, pool_metrics ++ proxy_metrics}, state}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the suggestion. The poll is linked to the current process, so we don't have to worry about it crashing, cause it would crash us too. There is a chance of timeout, but we can also just pass :infinity as the timeout (which I recommend anyway).

The only pending question is: do you want to return {:ok, metrics} or just the metrics directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't return just metrics because it's the part of DBConnection.Pool behavior.
By the way, do we really want the client of get_connection_metrics to wait indefinitely for a response or handle timeouts on its own?

Copy link
Contributor Author

@realglebivanov realglebivanov May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we've gotten rid of all the potential errors. We can probably get rid of maybeish type in the public API as well

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, do we really want the client of get_connection_metrics to wait indefinitely for a response

Why would we wait indefinitely? A timeout is a safe default if we have blocking code but that's not the case here. The whole pool is design to queue things immediately instead of blocking.

Copy link
Contributor Author

@realglebivanov realglebivanov May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant the GenServer.call calls with timeout = :infinity which in theory could cause indefinite wait times

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get what you mean but I didn't know all the ins and outs of :gen and that's why the first decision was not to risk with :infinity.
Anyway, I've updated the code.

@realglebivanov realglebivanov requested a review from josevalim May 24, 2024 11:28
lib/db_connection.ex Outdated Show resolved Hide resolved
@josevalim josevalim merged commit ab2a75b into elixir-ecto:master May 24, 2024
2 checks passed
@josevalim
Copy link
Member

💚 💙 💜 💛 ❤️

@realglebivanov
Copy link
Contributor Author

❤️ i'm a contributor now 🤣

@josevalim
Copy link
Member

And not an easy feature to add, well done!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants